230 research outputs found

    Dynamic Range Partitioning in Multiprocessor Database Implementations

    Get PDF
    Multiprocessor implementation of the relational database operators has recently received great attention in literature [1-4, 8, 11]. As the complexity of implementing the relational operators rests on the inter-node communication patterns involved in an operation, greater research attention has been focused on Join algorithms. The Join traffic patterns subsume those of the remaining relational operators. To effectively exploit parallelism in bucket based join implementations, the domain of the joining attributes must be partitioned into equal subranges. That is, the processing of each subrange requires roughly the same amount of time. A skewed distribution of workload significantly hinders performance. As relations exhibit a non-uniform attribute value distribution, possibly resulting from a previous operation, a priori determination of subrange boundary conditions results in a non-balanced workload across the processors. Performance degradation in parallel systems employing such static boundary subrange partitioning is demonstrated in Lakshmi and Yu [6]. That study exemplified that even a low degree of attribute skew results in a significant performance penalty. This paper proposes a statistical algorithm for dynamic determination of domain partitioning in bucket based join implementations. This statistics-based approach guarantees a near-uniform processor workload. A parameterization of the sample size versus the number of tuples is developed, and a proof of the validity of the approach is discussed. A simple illustrative example is presented

    Historical Document Enhancement Using LUT Classification

    Get PDF
    The fast evolution of scanning and computing technologies in recent years has led to the creation of large collections of scanned historical documents. It is almost always the case that these scanned documents suffer from some form of degradation. Large degradations make documents hard to read and substantially deteriorate the performance of automated document processing systems. Enhancement of degraded document images is normally performed assuming global degradation models. When the degradation is large, global degradation models do not perform well. In contrast, we propose to learn local degradation models and use them in enhancing degraded document images. Using a semi-automated enhancement system, we have labeled a subset of the Frieder diaries collection (The diaries of Rabbi Dr. Avraham Abba Frieder. http://ir.iit.edu/collections/). This labeled subset was then used to train classifiers based on lookup tables in conjunction with the approximated nearest neighbor algorithm. The resulting algorithm is highly efficient and effective. Experimental evaluation results are provided using the Frieder diaries collection (The diaries of Rabbi Dr. Avraham Abba Frieder. http://ir.iit.edu/collections/). © Springer-Verlag 2009

    Efficient MRF approach to document image enhancement

    Get PDF
    Markov random field (MRF) based approaches have been shown to perform well in a wide range of applications. Due to the iterative nature of the algorithm, the computational cost of such applications is normally high. In the context of document image analysis, where numerous documents have to be processed, this computational cost may become prohibitive. We describe a novel approach to document image enhancement using MRF. We show that by using domain specific knowledge, we are able to substantially improve computational performance by an order of magnitude. Moreover, in contrast to known techniques where patch initialization is arbitrary, in the proposed approach patch initialization is data consistent and so results in improved effectiveness. Experimental results comparing the proposed approach to known techniques using historical documents from the Frieder Collection are provided. © 2008 IEEE

    Ad Hoc Multi-Input Functional Encryption

    Get PDF
    Consider sources that supply sensitive data to an aggregator. Standard encryption only hides the data from eavesdroppers, but using specialized encryption one can hope to hide the data (to the extent possible) from the aggregator itself. For flexibility and security, we envision schemes that allow sources to supply encrypted data, such that at any point a dynamically-chosen subset of sources can allow an agreed-upon joint function of their data to be computed by the aggregator. A primitive called multi-input functional encryption (MIFE), due to Goldwasser et al. (EUROCRYPT 2014), comes close, but has two main limitations: - it requires trust in a third party, who is able to decrypt all the data, and - it requires function arity to be fixed at setup time and to be equal to the number of parties. To drop these limitations, we introduce a new notion of ad hoc MIFE. In our setting, each source generates its own public key and issues individual, function-specific secret keys to an aggregator. For successful decryption, an aggregator must obtain a separate key from each source whose ciphertext is being computed upon. The aggregator could obtain multiple such secret-keys from a user corresponding to functions of varying arity. For this primitive, we obtain the following results: - We show that standard MIFE for general functions can be bootstrapped to ad hoc MIFE for free, i.e. without making any additional assumption. - We provide a direct construction of ad hoc MIFE for the inner product functionality based on the Learning with Errors (LWE) assumption. This yields the first construction of this natural primitive based on a standard assumption. At a technical level, our results are obtained by combining standard MIFE schemes and two-round secure multiparty computation (MPC) protocols in novel ways highlighting an interesting interplay between MIFE and two-round MPC

    Efficient Document Re-Ranking for Transformers by Precomputing Term Representations

    Full text link
    Deep pretrained transformer networks are effective at various ranking tasks, such as question answering and ad-hoc document ranking. However, their computational expenses deem them cost-prohibitive in practice. Our proposed approach, called PreTTR (Precomputing Transformer Term Representations), considerably reduces the query-time latency of deep transformer networks (up to a 42x speedup on web document ranking) making these networks more practical to use in a real-time ranking scenario. Specifically, we precompute part of the document term representations at indexing time (without a query), and merge them with the query representation at query time to compute the final ranking score. Due to the large size of the token representations, we also propose an effective approach to reduce the storage requirement by training a compression layer to match attention scores. Our compression technique reduces the storage required up to 95% and it can be applied without a substantial degradation in ranking performance.Comment: Accepted at SIGIR 2020 (long

    Blake, Charles

    Get PDF
    Caching search results is employed in information retrieval systems to expedite query processing and reduce back-end server workload. Motivated by the observation that queries belonging to different topics have different temporal-locality patterns, we investigate a novel caching model called STD (Static-Topic-Dynamic cache). It improves traditional SDC (Static-Dynamic Cache) that stores in a static cache the results of popular queries and manages the dynamic cache with a replacement policy for intercepting the temporal variations in the query stream. Our proposed caching scheme includes another layer for topic-based caching, where the entries are allocated to different topics (e.g., weather, education). The results of queries characterized by a topic are kept in the fraction of the cache dedicated to it. This permits to adapt the cache-space utilization to the temporal locality of the various topics and reduces cache misses due to those queries that are neither sufficiently popular to be in the static portion nor requested within short-time intervals to be in the dynamic portion. We simulate different configurations for STD using two real-world query streams. Experiments demonstrate that our approach outperforms SDC with an increase up to 3% in terms of hit rates, and up to 36% of gap reduction w.r.t. SDC from the theoretical optimal caching algorithm

    Characterizing Question Facets for Complex Answer Retrieval

    Get PDF
    Complex answer retrieval (CAR) is the process of retrieving answers to questions that have multifaceted or nuanced answers. In this work, we present two novel approaches for CAR based on the observation that question facets can vary in utility: from structural (facets that can apply to many similar topics, such as 'History') to topical (facets that are specific to the question's topic, such as the 'Westward expansion' of the United States). We first explore a way to incorporate facet utility into ranking models during query term score combination. We then explore a general approach to reform the structure of ranking models to aid in learning of facet utility in the query-document term matching phase. When we use our techniques with a leading neural ranker on the TREC CAR dataset, our methods rank first in the 2017 TREC CAR benchmark, and yield up to 26% higher performance than the next best method.Comment: 4 pages; SIGIR 2018 Short Pape
    • …
    corecore